The authors propose OMG-Seg, a single transformer-based model that handles more than ten image and video segmentation tasks, including semantic, instance, panoptic, open-vocabulary, interactive, and video object segmentation. Sharing one encoder-decoder architecture across all of these tasks reduces the computational and parameter overhead of maintaining a separate model for each.
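
To make the shared encoder-decoder idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `SharedEncoder`, `SharedDecoder`, `UnifiedSegmenter`, and the per-task query lists are hypothetical stand-ins, and the arithmetic is a toy in place of real transformer layers. It only illustrates that one encoder and one decoder are built once and reused for every task.

```python
from typing import Dict, List


class SharedEncoder:
    """Toy stand-in for the shared encoder (hypothetical, not a real network)."""

    def __call__(self, pixels: List[float]) -> List[float]:
        return [2.0 * p for p in pixels]


class SharedDecoder:
    """Toy stand-in for the shared decoder: emits one binary mask per query."""

    def __call__(self, features: List[float], queries: List[float]) -> List[List[int]]:
        return [[1 if f >= q else 0 for f in features] for q in queries]


class UnifiedSegmenter:
    """One encoder and one decoder serve every task."""

    def __init__(self, task_queries: Dict[str, List[float]]) -> None:
        self.encoder = SharedEncoder()    # built once, reused by every task
        self.decoder = SharedDecoder()    # built once, reused by every task
        # Assumption for illustration: queries are the only per-task state.
        self.task_queries = task_queries

    def segment(self, pixels: List[float], task: str) -> List[List[int]]:
        if task not in self.task_queries:
            raise KeyError(f"unknown task: {task}")
        features = self.encoder(pixels)
        return self.decoder(features, self.task_queries[task])


model = UnifiedSegmenter({"semantic": [1.0], "instance": [0.5, 1.5]})
instance_masks = model.segment([0.1, 0.4, 0.9], task="instance")
semantic_masks = model.segment([0.1, 0.4, 0.9], task="semantic")
```

In this sketch, adding a task means adding an entry to `task_queries` rather than instantiating another encoder or decoder, which is the kind of saving the summary refers to.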